AITopics | cpg site

Collaborating Authors

cpg site

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Testing-driven Variable Selection in Bayesian Modal Regression

Duan, Jiasong, Zhang, Hongmei, Huang, Xianzheng

arXiv.org Machine LearningOct-29-2025

We propose a Bayesian variable selection method in the framework of modal regression for heavy-tailed responses. An efficient expectation-maximization algorithm is employed to expedite parameter estimation. A test statistic is constructed to exploit the shape of the model error distribution to effectively separate informative covariates from unimportant ones. Through simulations, we demonstrate and evaluate the efficacy of the proposed method in identifying important covariates in the presence of non-Gaussian model errors. Finally, we apply the proposed method to analyze two datasets arising in genetic and epigenetic studies.

artificial intelligence, covariate, machine learning, (18 more...)

arXiv.org Machine Learning

2510.23831

Country: North America > United States > South Carolina (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.95)

Add feedback

Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data

Wu, Rong, Chen, Ziqi, Li, Gen, Shu, Hai

arXiv.org Machine LearningFeb-25-2025

Motivation: Biomedical studies increasingly produce multi-view high-dimensional datasets (e.g., multi-omics) that demand integrative analysis. Existing canonical correlation analysis (CCA) and generalized CCA methods address at most two of the following three key aspects simultaneously: (i) nonlinear dependence, (ii) sparsity for variable selection, and (iii) generalization to more than two data views. There is a pressing need for CCA methods that integrate all three aspects to effectively analyze multi-view high-dimensional data. Results: We propose three nonlinear, sparse, generalized CCA methods, HSIC-SGCCA, SA-KGCCA, and TS-KGCCA, for variable selection in multi-view high-dimensional data. These methods extend existing SCCA-HSIC, SA-KCCA, and TS-KCCA from two-view to multi-view settings. While SA-KGCCA and TS-KGCCA yield multi-convex optimization problems solved via block coordinate descent, HSIC-SGCCA introduces a necessary unit-variance constraint previously ignored in SCCA-HSIC, resulting in a nonconvex, non-multiconvex problem. We efficiently address this challenge by integrating the block prox-linear method with the linearized alternating direction method of multipliers. Simulations and TCGA-BRCA data analysis demonstrate that HSIC-SGCCA outperforms competing methods in multi-view variable selection. Availability and implementation: Code is available at https://github.com/Rows21/NSGCCA.

canonical correlation analysis, hsic-sgcca, variable selection, (15 more...)

arXiv.org Machine Learning

2502.18756

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

iTARGET: Interpretable Tailored Age Regression for Grouped Epigenetic Traits

Wu, Zipeng, Herring, Daniel, Spill, Fabian, Andrews, James

arXiv.org Artificial IntelligenceJan-4-2025

Accurately predicting chronological age from DNA methylation patterns is crucial for advancing biological age estimation. However, this task is made challenging by Epigenetic Correlation Drift (ECD) and Heterogeneity Among CpGs (HAC), which reflect the dynamic relationship between methylation and age across different life stages. To address these issues, we propose a novel two-phase algorithm. The first phase employs similarity searching to cluster methylation profiles by age group, while the second phase uses Explainable Boosting Machines (EBM) for precise, group-specific prediction. Our method not only improves prediction accuracy but also reveals key age-related CpG sites, detects age-specific changes in aging rates, and identifies pairwise interactions between CpG sites. Experimental results show that our approach outperforms traditional epigenetic clocks and machine learning models, offering a more accurate and interpretable solution for biological age estimation with significant implications for aging research.

artificial intelligence, cpg site, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.02401

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Interpretable Deep Learning Methods for Multiview Learning

Wang, Hengkang, Lu, Han, Sun, Ju, Safo, Sandra E

arXiv.org Artificial IntelligenceFeb-15-2023

Technological advances have enabled the generation of unique and complementary types of data or views (e.g. genomics, proteomics, metabolomics) and opened up a new era in multiview learning research with the potential to lead to new biomedical discoveries. We propose iDeepViewLearn (Interpretable Deep Learning Method for Multiview Learning) for learning nonlinear relationships in data from multiple views while achieving feature selection. iDeepViewLearn combines deep learning flexibility with the statistical benefits of data and knowledge-driven feature selection, giving interpretable results. Deep neural networks are used to learn view-independent low-dimensional embedding through an optimization problem that minimizes the difference between observed and reconstructed data, while imposing a regularization penalty on the reconstructed data. The normalized Laplacian of a graph is used to model bilateral relationships between variables in each view, therefore, encouraging selection of related variables. iDeepViewLearn is tested on simulated and two real-world data, including breast cancer-related gene expression and methylation data. iDeepViewLearn had competitive classification results and identified genes and CpG sites that differentiated between individuals who died from breast cancer and those who did not. The results of our real data application and simulations with small to moderate sample sizes suggest that iDeepViewLearn may be a useful method for small-sample-size problems compared to other deep learning methods for multiview learning.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2302.0793

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Daily Digest

#artificialintelligenceSep-26-2022, 21:57:19 GMT

The genome of a eukaryotic cell is often vulnerable to both intrinsic and extrinsic threats owing to its constant exposure to a myriad of heterogeneous compounds. Researchers developed Metabokiller, an ensemble classifier that accurately recognizes carcinogens by quantitatively assessing their electrophilicity, their potential to induce proliferation, oxidative stress, genomic instability, epigenome alterations, and anti-apoptotic response. The development of medical applications of machine learning has required manual annotation of data, often by medical experts. Yet, the availability of large-scale unannotated data provides opportunities for the development of better machine-learning models. In this Review, the authors highlight self-supervised methods and models for use in medicine and healthcare, and discuss the advantages and limitations of their application to tasks involving electronic health records and datasets of medical images, bioelectrical signals, and sequences and structures of genes and proteins.

application, daily digest, feature selection method, (1 more...)

#artificialintelligence

Industry: Health & Medicine > Health Care Technology > Medical Record (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Identifying Epigenetic Signature of Breast Cancer with Machine Learning

Vaysburd, Maxim

arXiv.org Machine LearningOct-12-2019

The research reported in this paper identifies the epigenetic biomarker (methylation beta pattern) of breast cancer. Many cancers are triggered by abnormal gene expression levels caused by aberrant methylation of CpG sites in the DNA. In order to develop early diagnostics of cancer-causing methylations and to develop a treatment, it is necessary to identify a few dozen key cancer-related CpG methylation sites out of the millions of locations in the DNA. This research used public TCGA dataset to train a TensorFlow machine learning model to classify breast cancer versus non-breast-cancer tissue samples, based on over 300,000 methylation beta values in each sample. L1 regularization was applied to identify the CpG methylation sites most important for accurate classification. It was hypothesized that CpG sites with the highest learned model weights correspond to DNA locations most relevant to breast cancer. A reduced model trained on methylation betas of just the 25 CpG sites having the highest weights in the full model (trained on methylation betas at over 300,000 CpG sites) has achieved over 94% accuracy on evaluation data, confirming that the identified 25 CpG sites are indeed a biomarker of breast cancer.

breast cancer, cancer, cpg site, (15 more...)

arXiv.org Machine Learning

1910.06899

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback